OBA2: An Onion approach to Binary code Authorship Attribution
Abstract
A critical aspect of malware forensics is authorship analysis. The success of such analysis usually depends on the reverse engineer’s skills and on the volume and complexity of the code under analysis. To assist reverse engineers in this tedious and error-prone task, it is desirable to develop reliable, automated tools that support the practice of malware authorship attribution. In a recent work, machine learning was used to rank and select syntax-based features such as n-grams and flow graphs. The experimental results showed that the top-ranked features were unique to each author, which was taken as evidence that those features capture the authors’ programming styles. In this paper, however, we show that the uniqueness of a feature does not necessarily correspond to authorship. Specifically, our analysis demonstrates that many “unique” features selected using this method are clearly unrelated to the authors’ programming styles, for example, unique IDs or random but unique function names generated by the compiler; furthermore, the overall accuracy is generally unsatisfactory. Motivated by this discovery, we propose a layered Onion Approach for Binary Authorship Attribution called OBA2. The novelty of our approach lies in its three complementary layers: preprocessing, syntax-based attribution, and semantic-based attribution. Experiments show that our method produces results that are not only more accurate but also meaningfully connected to the authors’ styles.
© 2014 The Author. Published by Elsevier Ltd on behalf of DFRWS. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
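The abstract’s central observation, that a feature can be unique to one author without reflecting that author’s style, can be illustrated with a toy example. The sketch below is not OBA2’s algorithm. It assumes each author’s code is available as a list of disassembled instructions, uses instruction bigrams as the syntax-based feature, and uses IDA-style `sub_XXXX` labels as a stand-in for the generated, style-unrelated identifiers the abstract mentions. All names (`normalize`, `unique_features`, the sample authors) are hypothetical.

```python
import re
from collections import Counter

# Assumption: generated function names look like IDA-style "sub_<hex address>" labels.
GENERATED_NAME = re.compile(r"\bsub_[0-9A-Fa-f]+\b")


def normalize(instr: str) -> str:
    """Replace generated function names with a neutral placeholder token."""
    return GENERATED_NAME.sub("FUNC", instr)


def ngrams(instrs: list[str], n: int = 2) -> Counter:
    """Count n-grams of consecutive instructions."""
    return Counter(tuple(instrs[i:i + n]) for i in range(len(instrs) - n + 1))


def unique_features(samples: dict[str, list[str]], n: int = 2,
                    preprocess: bool = False) -> dict[str, set]:
    """Return, per author, the n-grams that occur in no other author's samples."""
    per_author = {}
    for author, instrs in samples.items():
        if preprocess:
            instrs = [normalize(x) for x in instrs]
        per_author[author] = set(ngrams(instrs, n))
    result = {}
    for author, feats in per_author.items():
        # Union of every other author's features (empty set if there are none).
        others = set().union(*(f for a, f in per_author.items() if a != author))
        result[author] = feats - others
    return result


# Hypothetical samples: identical code apart from the generated call target.
samples = {
    "alice": ["push ebp", "mov ebp, esp", "call sub_401000", "pop ebp"],
    "bob": ["push ebp", "mov ebp, esp", "call sub_402A10", "pop ebp"],
}
raw = unique_features(samples)
clean = unique_features(samples, preprocess=True)
```

Without preprocessing, both of `alice`’s “unique” bigrams hinge on the generated label `sub_401000`; after normalization the two samples are identical and no unique features remain. This mirrors the motivation for placing a preprocessing layer before syntax-based attribution.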
Similar resources
Corrigendum to 'OBA2: An Onion approach to Binary code Authorship Attribution' [Digit Investig 11 (2014) S94-S103]
The authors state that Algorithms 1 and 2 (on page 5), together with their explanations, were not correctly cited in the original article. The algorithms are borrowed from the authors’ previously published work (a Master’s thesis co-supervised by Dr. Mourad Debbabi and Dr. Benjamin Fung). The correct citation for Algorithms 1 and 2 is listed below: Farhadi, MR. Assembly Code Clone Detect...
When Coding Style Survives Compilation: De-anonymizing Programmers from Executable Binaries
The ability to identify authors of computer programs based on their coding style is a direct threat to the privacy and anonymity of programmers. Previous work has examined attribution of authors from both source code and compiled binaries, and found that while source code can be attributed with very high accuracy, the attribution of executable binaries appears to be much more difficult. Many pote...
Who Wrote This Code? Identifying the Authors of Program Binaries
Program authorship attribution—identifying a programmer based on stylistic characteristics of code—has practical implications for detecting software theft, digital forensics, and malware analysis. Authorship attribution is challenging in these domains where usually only binary code is available; existing source code-based approaches to attribution have left unclear whether and to what extent pr...
On the Feasibility of Malware Authorship Attribution
There are many occasions on which the security community is interested in discovering the authorship of malware binaries, either for digital forensics analysis of malware corpora or for thwarting live threats of malware invasion. Such a discovery of authorship might be possible due to stylistic features inherent to software code written by human programmers. Existing studies of authorship attribu...
Authorship and Plagiarism Detection Using Binary BOW Features
Identifying writing-style shifts and variations is a fundamental capability when addressing authorship-related tasks. In this work we examine a simplified approach for unsupervised authorship and plagiarism detection which is based on a binary bag-of-words representation. We evaluate our approach using PAN-2012 Authorship Attribution challenge data, which includes both open/closed class authorsh...
Journal title:
- Digital Investigation
Volume 11
Publication date: 2014